Structural Analysis of Retrovirus Assembly and Maturation

Retroviruses have a complex and tightly controlled life cycle that has been studied intensely for decades. After a virus enters the cell, it reverse-transcribes its genome, which is then integrated into the host genome; subsequently, all structural and regulatory proteins are transcribed and translated. The proteins, along with the viral genome, assemble into a new virion, which buds off the host cell and matures into an infectious virion. If any one of these steps is faulty, the virus cannot produce infectious progeny. Recent advances in structural and molecular techniques have made it possible to better understand this class of viruses, including details about how they regulate and coordinate the different steps of the virus life cycle. In this review we summarize the molecular analysis of the assembly and maturation steps of the life cycle by providing an overview of the structural and biochemical studies that address these processes. We also outline the differences between various retrovirus families with regard to these processes.


Introduction
Retroviruses are positive-sense RNA viruses, many of which cause diseases of major importance to humans and domestic animals. Their name comes from their ability to reverse transcribe their genome and insert it into the host cell genome. The retrovirus family can be divided into several subfamilies based on their nucleic acid sequences and their life cycles: Alpha- (e.g., Rous Sarcoma Virus, RSV), Beta- (e.g., Human Endogenous Retrovirus K, HERV-K), Gamma- (e.g., Murine Leukaemia Virus, MLV), Delta- (e.g., Human T-cell Leukaemia Virus, HTLV) and Epsilon-retroviruses (e.g., Walleye Dermal Sarcoma Virus, WDSV), Lentiviruses (e.g., Human Immunodeficiency Virus, HIV), and Spumaretroviruses (e.g., Human Foamy Virus, HFV). All subfamilies encode the open reading frames gag, pol and env, but differences in their genomic organization and additional regulatory proteins make each subfamily unique.

Gag Is the Driver of Retroviral Assembly
The polyprotein Gag is the main driver of retrovirus assembly. Gag is synthesized as a long polyprotein upon translation of the unspliced viral RNA (vRNA). Except in Spumaviruses, Gag is cleaved by the viral protease during viral maturation to form the infectious and mature viral core. Gag consists of three conserved major protein domains: Matrix protein (MA), which targets Gag to the viral assembly sites; Capsid protein (CA), which is responsible for the multimerization necessary for assembly of both the immature and mature virus; and Nucleocapsid (NC), which binds the genomic RNA (gRNA) (Figure 1d).

Immature Gag Lattice Structure
Apart from Spumaviruses, all retroviruses share a common morphology of the immature particle, with a dense, roughly spherical protein shell which stretches across approximately two-thirds of the internal surface of the virion (Figure 1a) [15]. The immature virus is formed exclusively by Gag hexamers that form a lattice through interactions of the CA domains as well as between SP1 or equivalent domains. An 8 nm spacing between hexamers is also a conserved feature of immature retroviral lattices [16][17][18][19]. However, a lattice consisting of only hexamers is mostly flat. To allow for curvature, it must incorporate holes or defects to fit onto a spherical surface [20]. The CA protein itself is divided into two subdomains: N-terminal (NTD) and C-terminal domains (CTD) separated by a flexible linker region. Despite low sequence conservation, the CA tertiary structure is highly conserved among retroviruses. The NTD generally consists of seven alpha helices, whereas the CTD is made up of four alpha-helices. These two domains are then joined by a flexible linker and oriented in such a way that they can form interactions to stabilize the hexamer [21]. The hexamer is the main building-block of the immature virus lattice. Intra-hexamer interactions are performed by a six-helix bundle (6HB) at the end of CTD and a spacer peptide downstream of capsid. Inter-hexamer interactions are performed by dimerization and trimerization of CA to form a lattice.
As the immature lattice is quite heterogeneous in size and morphology, resolving its complete structure is difficult. Some domains of HIV-1 Gag were solved individually in earlier studies by X-ray crystallography and NMR [22][23][24]. Recently, the structure of HIV-1 CA CTD and the spacer peptide that recapitulates the immature lattice has been solved by X-ray crystallography [25]. For the structures of native Gag immature lattice assemblies, cryoET and subtomogram averaging (STA) provide a very powerful method [19][26][27][28][29], allowing in situ structures to reach a resolution of ~3 Å [26]. These studies have shown that the hexameric assembly unit of the immature CA lattice takes the shape of a wine glass, with the cup walls formed by the NTD, the cup bottom by the CTD, the stem by the 6HB formed by the last residues of CA and SP1, and the base by the amorphous NC/vRNA layer (Figure 1b) [30]. The neighbouring Gag hexamers are inter-connected to form a hexagonal lattice through trimeric interactions in the corners (blue circle). Three hexamers join, causing dimeric interactions (red circle) on the sides of the hexamer (cartoon overview Figure 1c). Helix 2 is responsible for the trimeric interactions between hexamers, and the top of helix 4 in the NTD interacts with helices 5 and 6 in neighbouring hexamers (Figure 1c). In the CTD, the major homology region (MHR) is used for interactions within hexamers, and helix 9 contributes to the dimer interface between hexamers [18,31]. In contrast to the mature core, the Gag domains are organized in a linear fashion, meaning there are no contacts between NTDs and CTDs in immature lattices. Most viruses solved to date, such as HIV, RSV, EIAV and MLV, have the dimerization domain located in the CA CTD helix 9, while for HTLV-1 it seems to be in the NTD [17][18][19][32,33]. This unique feature may explain why HTLV-1 is the only retrovirus to display an immature lattice with straight facets [34].
A very important region within the immature lattice is the 6HB at the end of CA CTD and the beginning of SP1 (or a corresponding spacer peptide). This is the region where maturation inhibitors, such as Bevirimat (BVM) and PF46396 (PF96), bind [35,36]. In addition, a small molecule from the host cell, inositol hexaphosphate (IP6), was recently discovered to bind to the highly positively charged region at the top of the 6HB and to act as a conserved assembly co-factor for different retroviruses, critical for immature lattice assembly and viral infectivity [27,37]. The MHR, the loop between helices 9 and 10, and the beta-turn at the end of helix 11 further stabilize the 6HB. The 6HB is also found in other retroviruses, such as MLV, EIAV and RSV [32,38]. NMR and molecular dynamics simulations have shown that this helix bundle is in a dynamic helix-coil equilibrium and that this is an important feature for optimal maturation and infectivity [39]. A second-site compensatory mutation in the spacer peptide (T8I in HIV-1) in the BVM-resistant mutant was shown to alter the dynamic property of the 6HB and enable the formation of a fully extended, stable 6HB connecting to the NC/vRNA layer [27]. Although all retroviral immature lattices solved to date share a hexameric arrangement, the arrangement of the CA NTD and its protein contacts differ between them, while the CA CTD interactions are highly conserved [16][17][18][19].
Figure 1. (a) … [15] and HTLV-1 (right) [15] virus-like particles; black arrowheads point to the Gag lattice. Scale bars, 50 nm. (b) HIV-1 immature Gag structure, shown as a cryoET subtomogram averaging map (left) and an atomic model of the hexamer (right). One hexamer is highlighted in yellow in the cryoET STA map, within which one monomer is colored in blue (NTD) and orange (CTD and SP1). The grey outline shows the wine glass profile of the Gag hexamer [27]. (c) HIV-1 immature lattice organization in top view. Details of the dimeric, trimeric and hexameric interfaces are shown in gold, blue and green dashes, respectively. (d) HIV-1 Gag polyprotein domain organization. Original PDB accession codes: 1UPH (MA) [5], 7ASH (CA and SP1) [27], 1F6U (NC and SL2 RNA loop) [40], 2C55 (p6) [41].

Retroviral Maturation Is a Finely Regulated Process
The budded immature particles are non-infectious. During or shortly after budding, the viral protease is activated and starts cleaving the viral polyproteins into their individual constituents, which subsequently assemble into the infectious mature virion. The consequence of maturation is reflected in three aspects: at the sequence level, Gag is cleaved by the protease at all sites between structural proteins and spacer peptides; at the structure level, structural rearrangements take place between CA subdomains and among NC, vRNA and integrase (IN); at the architectural level, the particle converts from a spherical immature core containing only Gag hexamers to a mature conical shell containing CA hexamers and pentamers that encloses a condensed ribonucleoprotein complex (RNP).
The sequence and kinetics of cleavage are determined by the affinities between the cleavage sites and the protease active site, as well as by the accessibility of the cleavage sites. Any event that alters the sequence, rate or timing of maturation has drastic effects on viral morphology and infectivity [42]. In HIV-1, biochemical assays have shown that the first cleavage site within Gag to be processed is at the boundary of spacer peptide 1 (SP1) and NC. This cleavage releases NC and vRNA to condense into the RNP. The subsequent cleavages happen at the SP2-p6 and MA-CA boundaries, as well as at the NC-SP2 site. Finally, the last cleavage separates CA from SP1. This cleavage destabilizes the CA-SP1 6HB and was termed a maturation switch responsible for the large architectural rearrangement that converts the spherical immature assembly into a conical mature core [43].
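The cleavage order described above can be illustrated with a minimal sketch. It assumes a simplified linear HIV-1 Gag layout (MA-CA-SP1-NC-SP2-p6) and groups the cleavages into three stages (first, intermediate, last); the grouping, the function names and the data structures are illustrative assumptions and are not taken from any published tool. Real processing is governed by kinetics rather than discrete stages.

```python
# Illustrative sketch only: the domain layout and the three-stage grouping are
# simplifying assumptions based on the cleavage order described in the text.
GAG_DOMAINS = ["MA", "CA", "SP1", "NC", "SP2", "p6"]

# Each stage lists the boundaries (left domain, right domain) cleaved in it.
CLEAVAGE_STAGES = [
    [("SP1", "NC")],                               # first: releases NC (with vRNA)
    [("SP2", "p6"), ("MA", "CA"), ("NC", "SP2")],  # intermediate cleavages
    [("CA", "SP1")],                               # last: the maturation switch
]


def cleave(fragments, boundary):
    """Split whichever fragment contains the adjacent domain pair `boundary`."""
    left, right = boundary
    result = []
    for frag in fragments:
        for i in range(len(frag) - 1):
            if frag[i] == left and frag[i + 1] == right:
                result.extend([frag[: i + 1], frag[i + 1:]])
                break
        else:
            # Boundary not present in this fragment: keep it unchanged.
            result.append(frag)
    return result


def processing_intermediates(domains=GAG_DOMAINS, stages=CLEAVAGE_STAGES):
    """Return the fragments present after each cleavage stage."""
    fragments = [list(domains)]
    history = []
    for stage in stages:
        for boundary in stage:
            fragments = cleave(fragments, boundary)
        history.append(["-".join(f) for f in fragments])
    return history


if __name__ == "__main__":
    for n, frags in enumerate(processing_intermediates(), start=1):
        print(f"after stage {n}: {frags}")
```

In this toy model the CA-SP1 fragment persists until the final stage, which mirrors the role of the CA-SP1 cleavage as the last step before the architectural rearrangement.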
The cleavage of viral polyproteins into individual domains is performed by the viral protease (PR). PR is itself part of the GagPol (or GagProPol, depending on the retroviral subfamily) polyprotein, present at a ratio of 1:20 to 1:10 in relation to the polyprotein Gag. The recruitment of GagPol to the viral assembly sites is thought to occur through the same Gag-Gag interactions that build the immature virus. Moreover, the viral PR is active only as a homodimer. It is not yet clear what triggers PR activation, but it stands to reason that the first step must be the dimerization of two PR domains within two GagPol polyproteins. PR dimerization may also be facilitated by the dimerization of the RT and IN domains situated downstream of PR, as artificially enhancing or impairing RT and IN dimerization causes a corresponding effect on PR, with either premature or delayed activation [44][45][46]. In vitro cleavage studies have shown that a PR homodimer embedded in GagPol has much lower catalytic activity than 'free' PR homodimer. It is also suggested that the PR dimer in GagPol is constantly sampling different conformations and that the correct and active homodimer conformation is achieved only 5% of the time [47]. The initial cleavage events in maturation occur in an intramolecular fashion, where a GagPol-embedded PR dimer cleaves sites present in the same polyprotein that contains it. The initial sites to be cleaved in GagPol are at the SP1-NC boundary (also the initial cleavage in Gag) and within an internal site in p6*, a domain of GagPol that sits upstream of PR and that is created by the frameshift event that leads to GagPol synthesis [48]. It is thought that p6* is an important spatiotemporal regulator of PR activation [49]. Accessory viral proteins such as Vif and Nef have also been described as regulators of PR activity [50,51]. The subsequent intramolecular cleavage occurs at the boundary of p6* and PR and frees the PR N-terminal end. This is accompanied by an increase in PR catalytic activity, which is a consequence of better PR folding, particularly the formation of the four-stranded beta-sheet that stabilizes the PR homodimer [52]. The cleavage of the GagPol domains downstream of PR, as well as the cleavage of Gag and other viral proteins, is thought to proceed in an intermolecular fashion [48]. Although it is not known what triggers PR activation, this is clearly a sensitive event that is highly regulated both spatially and temporally. Attempts to pause PR activation and synchronize maturation led to gross structural defects in the resulting viral particles, even though proteolytic cleavage of the polyproteins proceeded successfully once the PR pause was lifted [42]. This suggests not only that the cleavage events need to happen in the right sequence and at the right rate, but also that maturation needs to be coordinated with other late viral replication events, probably assembly and budding.
The radical rearrangement of the virus architecture that most profoundly marks appropriate completion of HIV-1 maturation is the enclosure of the RNP inside a conical capsid. This affects the organization of multiple viral components: MA, CA, NC/vRNA, IN and Env. For HIV, recent cryoET findings have shown that MA trimers form a sparse and poorly ordered hexagonal lattice in the immature virus. Upon maturation, which curiously does not depend on proteolytic cleavage between MA and CA, this lattice becomes more ordered and tightly packed via the MA highly basic region, with inter-trimer interfaces completely different from those of the immature lattice [53]. The charge of the central pore in the hexamer of MA trimers also changes upon maturation, from basic to neutral, which may have consequences for Env cytoplasmic tail interactions [53]. Binding of genomic RNA by IN is critical for HIV-1 maturation [54]. Inhibition of IN-RNA interactions resulted in mislocalization of the RNP to the exterior of the mature capsid, as shown by the effects of HIV-1 class II IN mutations, as well as of allosteric IN inhibitors (ALLINIs), on HIV-1 maturation [55]. It has been reported that Env arrangement is also affected during maturation, with Env trimers changing from a scattered, low-mobility state in the immature particle to a highly mobile and clustered state in the mature virus [56].
There have been three hypotheses for how the architectural maturation (the switch from a spherical immature lattice to a conical mature lattice) takes place: displacive transition, de novo assembly, and a sequential combination of displacive nucleation followed by de novo assembly. The displacive model considers that the CA rearrangement occurs concomitantly with maturation, and that a portion of the cleaved lattice rolls away from the membrane while associated with the condensing RNP to form a mature conical core [57]. The displacive model is inspired by in vitro studies on non-diffusional transitions of tubular CA-NC assemblies when PR is added to the system [58]. The de novo assembly model postulates that, upon cleavage, the immature lattice is completely disassembled and a subset of the now soluble CA proteins re-oligomerize to form the mature core. This model is supported by the observation that the CA subdomain re-orientation and novel protein-protein contacts necessary to transition from the immature lattice to the mature lattice cannot be accommodated in the whole virus lattice due to spatial constraints. The third model is a combination of the previous two and suggests that the initial mature lattice nucleation step happens by a displacive mechanism that has a large contribution from RNP condensation. The expansion of the mature lattice into an enclosed core then happens by a de novo mechanism starting from the previously displacive-originated mature lattice. This has been supported by in vitro maturation studies combining parallel biochemical assays, cryoEM and computational modelling [59].
There are also competing hypotheses on the directionality of HIV-1 mature core assembly. It was initially proposed that core assembly proceeds from the base to the tip of the cone. This was supported by some cryoET observations: the HIV cone has a consistent cone angle of 18-24°; the wide base of the cone has a consistent 11 nm distance to the viral membrane; and the HIV RNP is located at the bottom of the conical core. The authors postulated that the RNP had a big role in nucleating the CA mature lattice and that cone growth started there until it reached the other end of the viral membrane. It was also observed that the tip end of the cone frequently had a hole, postulated to be a cone-closing defect [60]. A competing hypothesis proposes the opposite, that the core assembles from the tip to the base of the cone. This is also supported by cryoET observations, namely that the HIV cone spanned the whole diameter of the viral particle, even when viral particles varied in diameter. As such, a cone that grew from the tip towards the wide end would grow until it reached the opposing side of the viral membrane, at which point the membrane resistance would force the growing facet to bend (by the insertion of CA pentamers) until it closed at the base [61]. The third hypothesis for HIV-1 core assembly stems from the combination of cryoET and computer simulations of the nonequilibrium growth of elastic sheets. This third model proposes that core formation proceeds from a mature CA lattice sheet with a tendency to curve. This intrinsic curving propensity is given by the nature of the CA unit, which can be approximated as a tapered, prism-shaped 3D subunit. The growth of the core starts with the polymerization of a curved lattice sheet, in which the insertion of pentamers is necessary in order to resolve high-curvature regions. Eventually, the growing CA sheet curls and meets itself on the other end, closing the core. By modulating the simulation parameters, this model could recapitulate all shapes found in retroviral capsids, from polyhedra to cones and tubes. It also recapitulated many core defects found in conical cores, such as jelly-rolls and closing gaps, both at the side of the cone and at the narrow tip of the core [62,63].

Mature Core Lattice Structure
Upon maturation, a subset of the CA protein assembles to form the mature core (Figure 2a). The mature core is a metastable structure responsible for protecting the viral genome from detection by cytoplasmic host innate immunity sensors and for providing a compartment for reverse transcription initiation. The capsid core must also be able to uncoat to release the reverse transcribed genome for integration in the host genome. The capsid further acts as a docking platform for host proteins to facilitate the transport of the virus core through the cytoplasm into the nucleus [64][65][66]. Many host restriction factors also recognize the mature capsid surface lattice and evoke inhibitory actions [67][68][69][70].
The mature retrovirus capsid core follows the fullerene geometry model. It is predominantly made up of hexamers (Figure 2b) and incorporates a few pentamers. These pentamers are necessary, as they allow for high curvature in the lattice. For lentiviruses, this core has a conical shape. For alpha, beta, gamma, and deltaretroviruses, this core is polyhedral or cylindrical in shape. The overall core shape is determined by the placement of the pentamers in the core. In a conical core, this is done by a partition of seven pentamers in the broad cone end, and five at the narrow end. A 6-6 partition leads to a cylindrical core, while a randomly distributed partition creates a polyhedron. There are also viruses which incorporate more than 12 pentamers, such as MLV. MLV has multi-layered or multiple cores as it incorporates almost all cleaved CA molecules, and the core is less tightly packed. The various MLV core morphologies require up to 24 pentamers but they are remarkably similar in structure to their neighbouring hexamers [17]. HTLV-1 mature cores are very often incomplete, which may explain the remarkably low infectivity of this retrovirus [71].
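The requirement for exactly 12 pentamers in a single closed fullerene shell follows from Euler's polyhedron formula, and the link between pentamer partition and core shape can be written down directly. The sketch below assumes an idealized closed shell in which three capsomers meet at every vertex; the function names and the "unclassified" label are illustrative assumptions. It does not cover the multi-layered or multiple MLV cores mentioned above, which are not single closed fullerene shells.

```python
from fractions import Fraction


def euler_characteristic(pentamers, hexamers):
    """V - E + F for a closed lattice in which three capsomers meet at each vertex."""
    corners = 5 * pentamers + 6 * hexamers
    vertices = Fraction(corners, 3)  # each vertex is shared by 3 capsomers
    edges = Fraction(corners, 2)     # each edge is shared by 2 capsomers
    faces = pentamers + hexamers
    return vertices - edges + faces  # algebraically equal to pentamers / 6


def core_shape(broad_end, narrow_end):
    """Map a pentamer partition between the two ends to the shapes named in the text."""
    if broad_end + narrow_end != 12:
        raise ValueError("a single closed fullerene shell needs exactly 12 pentamers")
    if {broad_end, narrow_end} == {7, 5}:
        return "conical"
    if broad_end == narrow_end == 6:
        return "cylindrical"
    # Other partitions are not described in the text; label is a placeholder.
    return "unclassified"


if __name__ == "__main__":
    # A sphere-like closed surface has Euler characteristic 2, which forces
    # pentamers / 6 == 2 regardless of the number of hexamers.
    print(euler_characteristic(12, 200))
    print(core_shape(7, 5), core_shape(6, 6))
```

Because the hexamer count cancels out of the expression, the number of hexamers sets the size of the core while the 12 pentamers and their placement set its closure and shape.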
In HIV-1, the mature lattice has a thinner appearance (4 nm thick) in the cross section in comparison with the immature one (14 nm thick from CA NTD to NC/vRNA). The hexamer-hexamer spacing is larger than in the immature lattice, 10 nm instead of 8 nm.
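To put the two spacings in perspective, the area occupied by one hexamer can be estimated as a back-of-the-envelope calculation. The sketch assumes an ideal, flat hexagonal lattice, in which the unit cell area is (√3/2)·d² for a centre-to-centre spacing d; curvature and lattice defects are ignored, and the function name is illustrative.

```python
import math


def area_per_hexamer(spacing_nm):
    """Unit cell area (nm^2) of an ideal planar hexagonal lattice with the given spacing."""
    return math.sqrt(3) / 2 * spacing_nm ** 2


immature = area_per_hexamer(8.0)   # about 55.4 nm^2
mature = area_per_hexamer(10.0)    # about 86.6 nm^2
ratio = mature / immature          # (10 / 8) ** 2 = 1.5625

if __name__ == "__main__":
    print(f"immature: {immature:.1f} nm^2, mature: {mature:.1f} nm^2, ratio: {ratio:.4f}")
```

Under these assumptions, each hexamer in the mature lattice covers roughly 1.56 times the area it covers in the immature lattice, so a given surface area requires correspondingly fewer CA molecules.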
The key differences between immature and mature hexamers are the orientations of the NTD and CTD and the intermolecular NTD-CTD contacts that stabilize the mature hexamer (Figure 2b,c). For the mature lattice assembly, the CTD is the main stabilizing domain between hexamers, specifically the 2-fold interface formed by helix 9 and the 3-fold interface formed by helices 10 and 11 (Figure 2d) [72,73]. The dimer interface appears more stable and less flexible than the trimer interface, as suggested by its lower B-factor (Figure 2d). An important interface is formed by conserved positively charged residues at the center of the NTD hexamer. This position is reported to bind IP6 in a manner analogous to the positively charged ring found in the CTD at the top of the 6HB in the immature lattice [37,74]. In HIV-1, above the mature hexamer center lies a beta-hairpin described to adopt different conformations depending on pH [75]. These conformations translate into different accessibilities to the central CA hexamer channel and have been hypothesized to be an important regulator of capsid core permeability to the nucleotides necessary for reverse transcription of the viral genome [75].
Figure 2. (a,b) … [77]. One CA monomer is coloured in blue (NTD) and orange (CTD). (c) Comparison of immature CA-SP1 (lighter shade) and mature CA structure, aligned at the NTD (blue). (d) HIV-1 mature lattice organization in top view. The width and color of the sausage are directly proportional to the B-factor, from blue (−30) to red (−100). Details of the dimeric and trimeric interfaces are shown in red and blue dashes, respectively. (e) Intrinsic curvature of the mature CA hexamer by superposition of a planar CA hexamer (orange, PDB 4XFX [78]) and a highly curved hexamer (blue, PDB 6SKK [77]). (f) Tilt (left) and twist (right) angles between hexamers within a single core. Insets show a schematic illustration of tilt and twist angles [76]. The tilt/twist angle is indicated by the colour of the connecting lines between hexamer positions, from blue (less tilt along the long axis of the core) to red (more tilt along the circumference). (g) All-atom model of the HIV-1 conical core (bottom) and a cross-section of three hexamers along the curved direction (top). The black line illustrates continuous curvature of the lattice given by both intra- and inter-hexamer curvature, while the red line illustrates discrete curvature given by inter-hexamer curvature alone [77]. The lattice unit is marked with an orange hexagon; pentamers are shown in green [72].
The pentameric structure recently determined from native capsid by cryoET and subtomogram averaging is different from the previous crystal structure of the cross-linked pentamer [72,76]. Its NTDs are rotated by approximately 19 degrees compared to its hexameric counterpart. In doing so, it excludes helix 3 from the interface and forms a 10-helix bundle with its neighbouring hexamer instead. Moreover, the binding site for host factors and small molecules such as PF74 is more open at the pentamer NTD-CTD interface in comparison to the hexameric one [76]. The five arginine residues (R18) at the center of the pentamer in HIV CA (or the corresponding residue K17 in RSV CA) have been proposed to regulate the transition between hexamer and pentamer by balancing its electrostatic destabilization with stabilizing the lattice [38,79].
The CA domain of different viruses has very low sequence conservation; nonetheless, there is strong structural conservation between the CA proteins of different retroviruses [21]. This structurally conserved protein has a remarkable ability to accommodate different lattice curvatures. Recent cryoEM studies have shown that this ability is enabled by two features: the different tilt and twist orientations between CA hexamers [76] and, more prominently, the intrinsic curvature of the hexamer itself (Figure 2e), enabled by the flexible linker between the CA NTD and CTD [77] (Figure 2f,g). This curvature was also observed in mature RSV capsid-like particles, as RSV has a highly variable core. This flexibility can adapt to the different requirements of the hexamers at various curvatures and to a more random distribution of pentamers [38,80]. The curvature of the RSV capsid was thought to be derived from variable inter-capsomer interfaces; however, more recent structural data suggest that the capsomers are intrinsically curved [38,77].

Future Perspective
Even with all the recent structural findings on the replication cycle of retroviruses, many processes remain poorly understood. It is still unclear where Gag initially dimerizes, how it is transported to the assembly sites and how exactly maturation is triggered. The pathway by which the virus transitions from the immature to the mature capsid is also not completely understood. In addition, HIV-1 has served as the model organism to understand these concepts. Our understanding of other types of retroviruses has been catching up in recent years in terms of their mature and immature capsid architecture. Nonetheless, there is still much to be done to understand exactly how each virus regulates its assembly and maturation. Novel technical advances, such as correlative light and electron microscopy (CLEM) [81][82][83][84][85], cryoFIB lamella [86,87], cellular tomography [83,86], integrative imaging [83,88,89], and computational advances in subtomogram averaging [26,90], will play an important role in filling these knowledge gaps in the near future.